Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage] adding support for range predicate #9445

Merged
merged 7 commits into from
Sep 23, 2022

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Sep 21, 2022

  • adding rules to decompose range predicate into OR joined simple comparison.
  • if the look up is point or complemented points, we should still use IN / NOT IN
  • if the search result in a constant value (true of false) it might trigger logical value literal node, which is also supported here.

@walterddr walterddr changed the title [stash] adding support for range [multistage] adding support for range predicate Sep 21, 2022
@walterddr walterddr marked this pull request as ready for review September 21, 2022 21:21
@@ -200,6 +200,13 @@ private Object[][] provideTestSql() {
new Object[]{"SELECT a.col1, b.col2 FROM a JOIN b ON a.col1 = b.col1 "
+ " WHERE a.col1 IN ('foo') AND b.col2 NOT IN ('')"},

// Range conditions with continuous and non-continuous range.
new Object[]{"SELECT a.col1, b.col2 FROM a JOIN b ON a.col1 = b.col1 "
+ " WHERE a.col3 IN (1, 2, 3) OR (a.col3 > 10 AND a.col3 < 50)"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also check the case where someone tries an impossible range condition or conditions with overlaps? I was running into an issue with those edge-cases in my PR. Examples:

(a.col3 > 10 AND a.col3 < 20) AND (a.col3 > 30 AND a.col3 < 40) # Impossible range
(a.col3 > 10 AND a.col3 < 20) AND (a.col3 > 15 AND a.col3 < 25) # Overlapping range

Copy link
Contributor Author

@walterddr walterddr Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was planning to split that into a separate PR but yeah sure let me add that to this one.

Copy link
Contributor

@siddharthteotia siddharthteotia Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is col IN (<single value>) getting rewritten to col = <val> ?

FWIW, in the current engine we do this rewrite but not really worth it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single IN yes will be rewrite

@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2022

Codecov Report

Merging #9445 (9a268b1) into master (b6f3331) will decrease coverage by 2.76%.
The diff coverage is 46.15%.

@@             Coverage Diff              @@
##             master    #9445      +/-   ##
============================================
- Coverage     69.87%   67.10%   -2.77%     
- Complexity     4790     4912     +122     
============================================
  Files          1890     1412     -478     
  Lines        100961    73743   -27218     
  Branches      15353    11785    -3568     
============================================
- Hits          70543    49484   -21059     
+ Misses        25451    20676    -4775     
+ Partials       4967     3583    -1384     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.10% <46.15%> (-0.02%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pinot/query/planner/stage/StageNodeSerDeUtils.java 91.66% <0.00%> (-2.62%) ⬇️
...rg/apache/pinot/query/planner/stage/ValueNode.java 30.76% <30.76%> (ø)
...not/query/planner/logical/RelToStageConverter.java 78.18% <50.00%> (+1.25%) ⬆️
...ot/query/runtime/executor/WorkerQueryExecutor.java 95.38% <50.00%> (+0.14%) ⬆️
...t/query/runtime/operator/LiteralValueOperator.java 57.89% <57.89%> (ø)
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
... and 748 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@siddharthteotia siddharthteotia added the multi-stage Related to the multi-stage query engine label Sep 22, 2022
super(stageId);
}

public ValueNode(int currentStageId, DataSchema dataSchema,
Copy link
Contributor

@siddharthteotia siddharthteotia Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General suggestion -- for every new Operator / StageNode introduced, I think we should add a corresponding query compilation test to validate the plan, expressions etc. In the current engine, we have done that for quite some time in CalciteSqlCompilerTest.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one yes. ValueNode is a special node generated by calcite when the query evaluates back to a constant (regardless of underlying table)

value node replaces table scan as a root.

+ " WHERE a.col3 IN (1, 2, 3) OR (a.col3 > 10 AND a.col3 < 50)"},

new Object[]{"SELECT col1, SUM(col3) FROM a WHERE a.col3 BETWEEN 23 AND 36 "
+ " GROUP BY col1 HAVING SUM(col3) > 10.0 AND MIN(col3) <> 123 AND MAX(col3) BETWEEN 10 AND 20"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current engine non-literals (function expressions) on RHS get rewritten by moving to LHS. Do we support that here ?

WHERE a > b + 20 -> WHERE a - b > 20

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. this is not done by the range predicate but by the compilation on server.

@walterddr walterddr merged commit a07f757 into apache:master Sep 23, 2022
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
* adding rules to decompose range predicate into OR joined simple comparison.
* if the look up is point or complemented points, we should still use IN / NOT IN
* if the search result in a constant value (true of false) it might trigger logical value literal node, which is also supported here.

Co-authored-by: Rong Rong <[email protected]>
@walterddr walterddr deleted the pr_add_range branch December 6, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants